UNSUPERVISED EDGE MAP SCORING: A STATISTICAL COMPLEXITY APPROACH



JAVIER GIMENEZ, JORGE MARTINEZ, AND ANA GEORGINA FLESIA 

Abstract. Over the last decades, edge detection algorithms have reached a high degree of sophistication, but the tools that evaluate their performance have not kept pace. Selecting the best edge map output for a given image in an unsupervised way, without prior knowledge of the real edge structure, is still an open problem in image processing.

In this work we define a method to evaluate the performance of edge detection (ED) techniques without any knowledge of the true edge structure, using only the ED output. The method studies the quality of an edge map through a new statistical complexity measure that searches for the balance between the edge equilibrium and the edge information estimated from the ED image. In the ED context, edge equilibrium refers to a perfect shape made with very few edge points, while edge information refers to image structure made with many edge points. To measure edge equilibrium, a cosine-based similarity index is built by projecting the image onto a family of edge patterns that score the continuity and width of edges in fixed-size windows of the ED image. The information is measured by an edge map entropy function based on the Kolmogorov-Smirnov test statistic. The statistical complexity measure is thus defined as the product of the similarity index that measures equilibrium and the entropy that measures information.

Our experiments on selected images from the South Florida and Berkeley databases show that the new statistical complexity measure is able to score different man-made Ground Truths meaningfully, and to select the best edge map from a large set of outputs of six different edge detectors (Canny, Sobel, Prewitt, Roberts, Laplacian of Gaussian, and Zerocross), when compared with supervised selections made with Pratt's Figure of Merit (FOM) measure.



1. Introduction 

In most image processing techniques, the detection and handling of the edge structure of the input image is very important. From object detection to image transmission, the quality of the edge manipulation plays a big part in the success of the operation. Nevertheless, there is no universal definition of the notion of edge. In [AP79], an edge is defined as a local change in luminance or a discontinuity in the luminance intensity of the image, while in [KR81] it is pointed out that the edge concept depends on the type of processing and analysis in which it is involved.

Therefore, many researchers have designed edge detection algorithms that are nearly optimal with respect to some property or other of the edge structure, but only a few have studied how to measure the edge strength and quality of general edge maps [PP11]. Such measures can be classified by the need for a reference map called Ground Truth (GT) (supervised or unsupervised measures) and by the type of score that they output, quantitative or qualitative.

Some well-known examples of quantitative supervised measures, also called discrepancy measures, are Pratt's Figure of Merit (FOM) [Pra78], the Kappa index [Coh60], and Baddeley's error measure [Bad92]. Without the guide of a Ground Truth, the unsupervised methods found in the literature look for characteristics of the input edge map, such as coherence, continuity, smoothness and good continuation [BB79] [KR81] [Zhu96] [HSSB97], the empirical bootstrap likelihood of detecting a real edge [CMC97], or the identification of a specific pattern [Ber91], among others. Bowyer et al. (1999) [BKD99] already pointed out that these metrics based on qualitative properties of detected edges should be considered as additional secondary metrics, but should not be the primary measures of edge quality. A more detailed discussion of various edge detectors and edge-detection evaluation methods can be found in [BKD99] and [PP11].

In this paper, we propose a quantitative unsupervised measure that searches for a compromise 
between two extreme values in the space of edge maps: a map with few edge points in a perfect 
shape (equilibrium) and many edge points located randomly (Information). To our knowledge, there 
is no previous work that attempts to define notions of Equilibrium and Information (or Entropy) 
in the space of possible edge maps. We propose as Equilibrium function a discrepancy measure 
that combines edge map projections into a family of edge patterns. Also, in order to measure 
the Information present in the edge map we define a new concept of entropy as a function of the 
Kolmogorov-Smirnov test statistic. The combination of both measures produces a new statistical complexity measure that is capable of scoring an edge map output in a way completely different from all other measures in the literature.

Our work is organized as follows. In Section 2 we introduce a discrepancy measure $q_{GT}$ that will later be the basis of the Equilibrium measure. The measure $q_{GT}$ can be compared with the well-known Pratt measure to assess the fairness of the scoring. In Section 3 we introduce the concepts of Equilibrium and Entropy, and define the final statistical complexity measure. We show some experiments in Section 4 and leave the conclusions and comments for Section 5.

2. A COSINE DISCREPANCY MEASURE $q_{GT}$

Our main goal in this paper is to define a new quality measure to score edge maps without knowledge of the Ground Truth. We now introduce a discrepancy measure $q_{GT}$ only as an intermediate step in the definition of the final measure $C$.

Let $I$ be an image of size $N \times M$, and let $b$ be an edge map associated with $I$, that is, $b$ is a binary image of the same size as $I$, and let $GT$ be its Ground Truth. Figure 1 shows an example of such images. A simple measure of discrepancy between $b$ and $GT$ is the cosine of the angle $\beta$ between them, when they are seen as 1-d vectors (concatenating all columns one under another).



(1)    $q_{GT}(b) = \cos(\beta) = \dfrac{GT^{\top} b}{\|GT\| \, \|b\|}$

where $\beta$ is the angle between $GT$ and $b$, and $\|x\| = \sqrt{x^{\top} x}$.

If there is more than one Ground Truth available for a given image, $\mathcal{F} = \{GT_1, GT_2, \dots, GT_n\}$, the index is defined as the maximum of all scores

(2)    $q_{\mathcal{F}}(b) = \max_{1 \le i \le n} \dfrac{GT_i^{\top} b}{\|GT_i\| \, \|b\|}$

The Cauchy-Schwarz inequality implies that the index is upper bounded by one, and it equals one only when the map is optimal ($b \in \mathcal{F}$, that is, $b$ coincides with one of the available Ground Truths). Since the edge maps and GTs are binary images, the index is lower bounded by 0, and it equals 0 only in the absence of any similarity ($b$ is orthogonal to every Ground Truth in $\mathcal{F}$).
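The cosine discrepancy and its multi-Ground-Truth version can be illustrated with a minimal sketch. It uses plain Python lists as binary images; the function names and the convention of returning 0 for an empty map are assumptions made here for illustration, not part of the original method.

```python
import math

def flatten_columns(img):
    """Concatenate the columns of a 2-D binary image into one 1-D list."""
    rows, cols = len(img), len(img[0])
    return [img[r][c] for c in range(cols) for r in range(rows)]

def cosine_score(gt, b):
    """Cosine of the angle between a Ground Truth and an edge map."""
    u, v = flatten_columns(gt), flatten_columns(b)
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty map shares nothing with the other
    return dot / (norm_u * norm_v)

def cosine_score_family(gts, b):
    """Maximum cosine score over a family of Ground Truths."""
    return max(cosine_score(gt, b) for gt in gts)

# Toy 3 x 3 example: a vertical edge as Ground Truth.
gt = [[0, 1, 0],
      [0, 1, 0],
      [0, 1, 0]]
b_partial = [[0, 1, 0],
             [0, 1, 0],
             [0, 0, 1]]   # two of three edge pixels coincide with gt
b_orthogonal = [[1, 0, 0],
                [1, 0, 0],
                [1, 0, 0]]  # no pixel in common with gt

print(cosine_score(gt, b_partial))
print(cosine_score(gt, b_orthogonal))
print(cosine_score_family([gt, b_orthogonal], b_orthogonal))
```

In this toy case the partially matching map shares two of three edge pixels with the Ground Truth, the orthogonal map shares none, and a map that belongs to the family attains the upper bound.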




Figure 1. (a) Original image, (b) Ground Truth, (c) Edge map made with the Sobel edge detector, $q_{GT} = 0.3976$.



3. A STATISTICAL COMPLEXITY MEASURE 



In this section we define a statistical complexity measure that evaluates the performance of edge detection (ED) techniques without any knowledge besides the ED output. The measure needs two complementary indexes, an equilibrium index $q(b)$ and an entropy index $H(b)$, that score over the space of all edge maps. We follow the general structure of complexity measures described in [LRMC95], so we define our statistical complexity measure $C$ as

(3)    $C(b) = q(b) H(b)$

We say that an edge map is well balanced (has reached equilibrium) if it is structurally simple. In this sense, the map in panel (d) of Figure 2 is better balanced than the edge map in panel (c), which in turn is better balanced than the one in (b). On the other hand, we say that a map has more information than another if it better characterizes the discontinuities, textures and shapes of the analyzed image. An overabundance of information produces chaotic (cluttered) edge maps like the one in (b), and the absence of information produces poor edge maps like the one in (d). Thus, Equilibrium and Information are two complementary concepts, and their product, the complexity, searches for a balance point between them.

Figure 2. (a) Original image, (b)-(d) Edge map outputs of Canny's algorithm with parameters: high threshold HT = 0.01, 0.19, 0.99; low threshold LT = 0.4*HT; and smoothness parameter $\sigma = \sqrt{2}$.

In order to measure the equilibrium of an edge map, we modified the cosine measure of Section 2, replacing the family of GTs by a family of edge patterns that ensure the correct identification and valuation of the usual local characteristics of edges. On the other hand, entropy is a concept that measures the amount of information of a system, and it is maximized when the system reaches a random state. Thus we quantify the randomness of an edge map with a function based on the Kolmogorov-Smirnov (KS) goodness-of-fit statistic, which measures the statistical distribution of the edge patterns against the uniform distribution.

3.1. Equilibrium measure. Abdou and Pratt [?] introduced in their seminal paper the notion of figure of merit, in order to score edge patterns that are fragmented, offset and smeared relative to the ideal edges present in the Ground Truth. We want our equilibrium index to do a similar task in the unsupervised case, so we replace the Ground Truth with a family $B$ of carefully chosen binary edge patterns.

Abusing notation, let $B = \{b_1/\|b_1\|, \dots, b_n/\|b_n\|\}$ be the collection of all the edge patterns transformed into column vectors. Sliding a window over the edge map $b$, centered at each pixel position $k$, we can extract the edge sub-maps $b(k)$ and measure the local correlation of the edge map $b$ with respect to the family $B$ with:



(4)    $q_B(b(k)) = \max_{1 \le i \le n} \dfrac{b_i^{\top} b(k)}{\|b_i\| \, \|b(k)\|}$



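The equilibrium of $b$ with respect to the family of edges $B$ is the average of the local measures computed only on edge pixels $k$:

(5)    $q_B(b) = \dfrac{1}{\#E} \sum_{k=1}^{\#E} q_B(b(k)) = \dfrac{1}{\#E} \sum_{k=1}^{\#E} \max_{1 \le i \le n} \dfrac{b_i^{\top} b(k)}{\|b_i\| \, \|b(k)\|}$

where $E$ is the set of all edge pixels in the binary edge map $b$.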
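The local score (the best cosine between a window and any pattern) and its average over edge pixels can be sketched as follows. The zero padding at the image border, the helper names and the tiny two-pattern family with 3 x 3 windows are assumptions chosen only to keep the example small; the paper works with 7 x 7 windows and a much larger family.

```python
import math

def window(b, r, c, half):
    """Flattened (2*half+1) x (2*half+1) window centered at (r, c).
    Pixels outside the image are read as 0 (assumed border handling)."""
    rows, cols = len(b), len(b[0])
    return [b[i][j] if 0 <= i < rows and 0 <= j < cols else 0
            for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]

def local_score(win, patterns):
    """Best cosine between one window and any pattern of the family."""
    norm_w = math.sqrt(sum(x * x for x in win))
    best = 0.0
    for p in patterns:
        norm_p = math.sqrt(sum(x * x for x in p))
        if norm_w > 0 and norm_p > 0:
            dot = sum(x * y for x, y in zip(p, win))
            best = max(best, dot / (norm_p * norm_w))
    return best

def equilibrium(b, patterns, half):
    """Average of the local scores, computed only at edge pixels."""
    edge_pixels = [(r, c) for r, row in enumerate(b)
                   for c, v in enumerate(row) if v]
    if not edge_pixels:
        return 0.0  # assumption: an empty map has no equilibrium
    total = sum(local_score(window(b, r, c, half), patterns)
                for r, c in edge_pixels)
    return total / len(edge_pixels)

# Toy family: one vertical and one horizontal 3 x 3 line, flattened row by row
# (the same ordering as `window`, so the cosine is unaffected by the choice).
vertical = [0, 1, 0, 0, 1, 0, 0, 1, 0]
horizontal = [0, 0, 0, 1, 1, 1, 0, 0, 0]
edge_map = [[0, 1, 0],
            [0, 1, 0],
            [0, 1, 0]]
print(equilibrium(edge_map, [vertical, horizontal], half=1))
```

The central pixel matches the vertical pattern exactly, while the two border pixels see a truncated line because of the padding, so the average falls slightly below one.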



3.1.1. A family of edge patterns. The family $B$ of edge patterns could be very general, but in this paper, following the ideas of Kitchen et al. [KR81], we work only with line-like edge patterns; see Figure 3.

Figure 3. The 7 x 7 patterns considered.



The line segment is an essential graphic primitive that can be used to construct many other objects. Our line patterns are built with the accurate and efficient raster line-generating algorithm of Bresenham [Bre65]. Bresenham showed that his line algorithm provides the best-fit approximation to the true line by minimizing the error (distance) to the true primitive. Beginning with ray traces that go through the origin, we constructed 140 edge patterns of size 7 x 7.
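As an illustration of how line-like patterns can be rasterized, the sketch below draws Bresenham rays from the center of a 7 x 7 window to each of its border pixels. This particular construction (one ray per border pixel, 24 patterns) is an assumption made for the example; it does not attempt to reproduce the 140 patterns used in the paper, whose exact construction is not detailed here.

```python
def bresenham(r0, c0, r1, c1):
    """Integer Bresenham line from (r0, c0) to (r1, c1), endpoints included."""
    points = []
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    step_r = 1 if r0 < r1 else -1
    step_c = 1 if c0 < c1 else -1
    err = dr - dc
    r, c = r0, c0
    while True:
        points.append((r, c))
        if (r, c) == (r1, c1):
            break
        e2 = 2 * err
        if e2 > -dc:      # error allows a step along the row axis
            err -= dc
            r += step_r
        if e2 < dr:       # error allows a step along the column axis
            err += dr
            c += step_c
    return points

def ray_patterns(size=7):
    """One binary size x size pattern per border pixel: a ray from the center."""
    half = size // 2
    border = [(r, c) for r in range(size) for c in range(size)
              if r in (0, size - 1) or c in (0, size - 1)]
    patterns = []
    for r1, c1 in border:
        grid = [[0] * size for _ in range(size)]
        for r, c in bresenham(half, half, r1, c1):
            grid[r][c] = 1
        patterns.append(grid)
    return patterns

patterns = ray_patterns(7)
print(len(patterns))
for row in patterns[1]:   # ray from the center to border pixel (0, 1)
    print("".join("#" if v else "." for v in row))
```

Full lines through the origin can be obtained by taking the union of two opposite rays, and thicker or offset variants could be added in the same way.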

In Figure 4 we can observe the value of the Equilibrium index (5) on different patterns that appear in an edge map computed from the Block image. Noisy patterns (b)-(e) reach an index value lower than 0.54. The edge patterns (c), (f), (g) and (h) show the performance of the index when the edges are closer to line segments. The maps (h)-(k) show the behavior of the index in the presence of thick edges. The maximum value is reached in (h), a pattern consisting of a line one pixel wide.

Figure 4. (a) Edge map of image Block, (b)-(k) windows of size 7 x 7 extracted from (a).

3.2. Information and the Kolmogorov-Smirnov statistic. Whenever we make statistical observations, or design and conduct statistical experiments, we seek information. How much can we infer from a particular set of statistical observations or experiments about the sampled population? Shannon [Sha48] quantified the information provided by an observation in proportion to how improbable it is. Relating this notion to our edge detection problem, if we have three aligned points in an edge map, the probability of having a fourth point next to them is higher than the probability of having a point further away. So, observing a point in a place with low probability gives more information than observing a point in an expected place.

We define our notion of information through a new entropy function that assesses the randomness of an edge map through the Kolmogorov-Smirnov (KS) goodness-of-fit test. For a given edge map $b$, we select all edge pixels $E$, map their positions to the unit square $[0, 1] \times [0, 1]$ with a suitable injective function $\phi$, and test the goodness of fit of such a sample to the uniform distribution $U$ on the unit square.

Let $D$ be the two-dimensional Kolmogorov-Smirnov statistic defined by [JPZ97] as

(6)    $D(b) = \sup_{(x,y)} |F_b(x,y) - F(x,y)|$

where $F$ is the cumulative distribution function of a uniformly distributed two-dimensional vector, and $F_b$ is the empirical distribution function of the sample $\phi(E)$ given by

(7)    $F_b(x, y) = \dfrac{\#\{(x_k, y_k) \in \phi(E) : x_k \le x,\ y_k \le y\}}{\#E}$

We use the efficient algorithm developed in [JPZ97] to compute $D(b)$. The KS statistic takes values between 0 and 1, rejecting the uniform hypothesis for values closer to 1, so we define our entropy measure $H$ as

(8)    $H(b) = 1 - D(b)$
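To make the entropy index concrete, the following sketch evaluates the distance between the empirical distribution of the edge pixels and the uniform distribution $F(x, y) = xy$ by brute force. It is only an approximation under stated assumptions: it is not the efficient algorithm of [JPZ97], the supremum is searched only on the grid of sample coordinates (using both closed and open counts), the mapping to the unit square by pixel centers is one possible choice of injective function, and the cubic cost makes it suitable for small maps only.

```python
def ks_uniform_2d(points):
    """Brute-force distance between the empirical CDF of `points`
    (pairs in [0, 1] x [0, 1]) and the uniform CDF F(x, y) = x * y."""
    n = len(points)
    xs = sorted({x for x, _ in points} | {1.0})
    ys = sorted({y for _, y in points} | {1.0})
    d = 0.0
    for x in xs:
        for y in ys:
            # empirical CDF at (x, y) and its limit from below-left
            closed = sum(1 for px, py in points if px <= x and py <= y) / n
            opened = sum(1 for px, py in points if px < x and py < y) / n
            d = max(d, abs(closed - x * y), abs(opened - x * y))
    return d

def entropy(b):
    """H(b) = 1 - D(b), with edge pixels mapped to pixel centers
    in the unit square (assumed choice of the injective map)."""
    rows, cols = len(b), len(b[0])
    pts = [((r + 0.5) / rows, (c + 0.5) / cols)
           for r in range(rows) for c in range(cols) if b[r][c]]
    if not pts:
        return 0.0  # assumption: an empty map carries no information
    return 1.0 - ks_uniform_2d(pts)

full_map = [[1] * 4 for _ in range(4)]              # edge pixels everywhere
corner_map = [[1, 0, 0, 0]] + [[0] * 4 for _ in range(3)]  # a single pixel
print(entropy(full_map), entropy(corner_map))
```

A map whose edge pixels are spread over the whole image scores a higher entropy than a map concentrated in one corner, which is the qualitative behavior the index is meant to capture.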

3.3. Statistical Complexity Measure for edge scoring. We are now ready to define our statistical complexity measure $C$, which scores edge maps with a value between zero and one, balancing the information and equilibrium scores estimated from the edge map $b$:

(9)    $C(b) = q_B(b) H(b)$.

4. Results and Analysis. 
We will explore the behavior of our measure in two main situations:

(1) Selecting the best map $b_{BD}$ from a set of edge maps $S_{BD}$ made with different detectors over the same image:

    $b_{BD} = \arg\max_{b \in S_{BD}} C(b)$

(2) Selecting the best map $b_{BP}$ from a set of edge maps $S_{BP}$ made with the same detector, moving the parameters over a wide range:

    $b_{BP} = \arg\max_{b \in S_{BP}} C(b)$

We will use selected images from the public dataset of the University of South Florida [BKD99]. This image dataset includes two subsets, one of 50 natural images and another of 10 aerial images, all of them with their ground truth edge images. Each of the fifty images of the first subset contains a single object approximately centered in the image, set against a natural background. The second subset has images of man-made constructions. We will also work with selected images from the Berkeley database [MFTM01], which has several different ground truth edge images available for each image. We will analyze the performance of our measure by studying six gradient-based edge detectors, Canny, Prewitt, Sobel, Roberts, Laplacian of Gaussian and Zerocross, using the Matlab implementation in the edge function.

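We will also compute the well-known Pratt's FOM discrepancy measure to correlate its values with our measure.

    $\mathrm{Pratt}(b) = \dfrac{1}{\max(N_{GT}, N_b)} \sum_{k=1}^{N_b} \dfrac{1}{1 + \alpha d_k^2}$

where $N_{GT}$ and $N_b$ are the numbers of edge pixels in the Ground Truth and in the edge map $b$, $d_k$ is the distance from the $k$-th detected edge pixel to the nearest Ground Truth edge pixel, and $\alpha$ is a scaling constant.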
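For reference, Pratt's figure of merit is usually written as the sum of $1/(1 + \alpha d_k^2)$ over the detected edge pixels, divided by the larger of the two edge-pixel counts, where $d_k$ is the distance from the $k$-th detected pixel to the nearest Ground Truth edge pixel. A minimal brute-force sketch follows; the value $\alpha = 1/9$ is a commonly used default assumed here (the text does not state the value used), and the function name is illustrative.

```python
def pratt_fom(gt, b, alpha=1.0 / 9.0):
    """Pratt's figure of merit of edge map `b` against Ground Truth `gt`.
    Both are 2-D lists of 0/1 values with the same size."""
    ideal = [(r, c) for r, row in enumerate(gt)
             for c, v in enumerate(row) if v]
    detected = [(r, c) for r, row in enumerate(b)
                for c, v in enumerate(row) if v]
    if not ideal or not detected:
        return 0.0  # assumption: undefined cases score zero
    total = 0.0
    for r, c in detected:
        # squared Euclidean distance to the nearest Ground Truth edge pixel
        d2 = min((r - gr) ** 2 + (c - gc) ** 2 for gr, gc in ideal)
        total += 1.0 / (1.0 + alpha * d2)
    return total / max(len(ideal), len(detected))

gt = [[0, 1, 0],
      [0, 1, 0],
      [0, 1, 0]]
shifted = [[0, 0, 1],
           [0, 0, 1],
           [0, 0, 1]]   # same edge, displaced by one pixel
print(pratt_fom(gt, gt), pratt_fom(gt, shifted))
```

A perfect match scores one, and a one-pixel offset is penalized only mildly, which is the tolerance to displaced edges that makes the measure a common supervised reference.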



4.1. Canny edge detector. In our first example, we work with the well-known Canny detector. We computed a set $S_{BP}$ of 100 edge maps by moving Canny's parameters in the following fashion: we fixed the smoothness parameter $\sigma = \sqrt{2}$, sampled the hysteresis parameter HT (high threshold) 100 times from zero to one, and defined the LT parameter (low threshold) as LT = 0.4*HT. Over this database of edge maps we computed the equilibrium measure, the entropy measure and the complexity measure, $q_B$, $H$ and $C$, respectively. As the test images have ground truth available, we also computed our cosine discrepancy measure and Pratt's FOM measure.

Figure 5. (a) Original image 109, (b) Ground Truth, (c)-(d) Canny's extreme edge maps with high threshold HT = 0.01, 0.99; low threshold LT = 0.4*HT; and $\sigma = \sqrt{2}$.

In Figure 5 we show in panel (a) an example image from the South Florida dataset, named 109; in panel (b) the Ground Truth available from that database; in panel (c) Canny's edge map with HT = 0.01; and in panel (d) Canny's edge map with HT = 0.99. The last two edge maps are the extreme possibilities in the set $S_{BP}$ when compared with the GT: the first has too high a sensitivity, with many texture details transformed into short edges, and the second has very low sensitivity, with almost no edges selected. The other 98 edge maps lie between these two extremes.

In panel (a) of Figure 6 we show a plot of the equilibrium measure $q_B$, the entropy measure $H$ and the complexity measure $C$ as functions of the HT values. This is a very interesting plot, because the statistical complexity measure is a compromise between Equilibrium and Information, and the maximum value of $C$ over the range of parameters synthesizes such a compromise. In panel (b) of Figure 6 we show a plot of Pratt's FOM measure and our cosine discrepancy measure over the same parameter range.

The edge map selected by

    $b_{BP} = \arg\max_{b \in S_{BP}} C(b)$

(shown in panel (a) of Figure 7) can be compared visually with the edge map selected by maximizing the supervised Pratt measure (shown in panel (b) of Figure 7). The values of the thresholds can also be compared; in this case, the $C$ measure selected a map with threshold HT = 0.11 and Pratt a map with threshold HT = 0.14.

Figure 7. (a)-(b) Edge maps obtained with Canny with high threshold HT = 0.11, 0.14; low threshold LT = 0.4*HT; and $\sigma = \sqrt{2}$. (c) Best edge map selected with $C$, (d) best edge map selected with Pratt's and $q_{GT}$.

4.2. Roberts, Prewitt, Laplacian of Gaussian, Canny, Zerocross and Sobel edge detectors. In this example, we consider other classical gradient-based detectors, Roberts, Prewitt, Sobel, Laplacian of Gaussian and Zerocross, which are implemented in the function edge in the Image Processing Toolbox of Matlab. We selected the image "woods" from the South Florida database and sampled 100 equispaced instances of the parameter space of each detector, chosen so as to produce a cluttered map and a white map as extremes. In Figure 8 we show a plot of the $C$ measures for the set of maps of each detector. The maximum values of all the $C$ curves have the same order of magnitude.

Figure 8. Complexity measures for all detectors computed over the woods image.



Edge Detector           Matlab optimal parameters               $q_B$    $H = 1-D$   $C$      Pratt    $q_{GT}$
Canny                   T=[0.076;0.19]; $\sigma = \sqrt{2}$     0.7472   0.9646      0.7208   0.3061   0.2876
Laplacian of Gaussian   T=0.0076; $\sigma = 2$                  0.7760   0.9201      0.7139   0.3951   0.2654
Prewitt                 T=0.064                                 0.7384   0.9262      0.6839   0.4197   0.3787
Roberts                 T=0.0052                                0.6515   0.9368      0.6104   0.3649   0.3348
Sobel                   T=0.064                                 0.7326   0.9293      0.6808   0.4137   0.3775
Zerocross               T=0.0076                                0.7760   0.9201      0.7139   0.3951   0.2654
GT                      -                                       0.8288   0.7739      0.6456   -        -

Table 1. $C$ scores over all best ED outputs and the Ground Truth of image Woods.



In Table 1 we show the scores given to the best of each detector's maps, and the score assigned to the GT, and in Figure 9 we show the maps selected by the measure $C$. We also score the selected ED outputs with $q_{GT}$ and Pratt's measure using the provided GT. All the measures, the two supervised ones and our unsupervised one, rated the maps as similar, a result that can be corroborated visually.
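The selection across detectors (the first of the two situations described at the start of this section) can be reproduced from the tabulated values alone. The sketch below assumes that the first two numeric columns of Table 1 are the equilibrium and entropy indexes, recomputes the complexity as their product, and takes the arg max; the dictionary layout and variable names are illustrative.

```python
# (equilibrium, entropy) pairs for the best map of each detector,
# copied from the first two numeric columns of Table 1.
table1 = {
    "Canny": (0.7472, 0.9646),
    "Laplacian of Gaussian": (0.7760, 0.9201),
    "Prewitt": (0.7384, 0.9262),
    "Roberts": (0.6515, 0.9368),
    "Sobel": (0.7326, 0.9293),
    "Zerocross": (0.7760, 0.9201),
}

def complexity(q, h):
    """Statistical complexity: product of equilibrium and entropy."""
    return q * h

scores = {name: complexity(q, h) for name, (q, h) in table1.items()}
best_detector = max(scores, key=scores.get)

for name, c in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:22s} C = {c:.4f}")
print("selected:", best_detector)
```

The recomputed products agree with the tabulated complexity column up to rounding of the inputs, and the highest score among the six detectors belongs to Canny, although all six values are close to each other.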



4.3. Several Ground Truths. Our last example was made with an image from the Berkeley Segmentation Database [MFTM01], a benchmark database for boundary detection algorithms that has 300 images with several hand-made segmentations offered as Ground Truth. The level of detail of the different GT segmentations is diverse, and it represents human opinion about what the structural edges of the objects in the images are.

In Figure 11 we show six different Ground Truths available for the image 86000, Building. The first GT is very detailed, so any supervised measure computed with this GT will give high marks to a cluttered edge map, while a measure computed using the GT in panel (GT5) will certainly score as better a map with very few edge points. But which map will receive high marks from an unsupervised measure such as our complexity measure?

Figure 9. One image, all detectors. (a) Original image Woods from the South Florida dataset; (b) GT; (c) Canny; (d) LoG; (e) Prewitt; (f) Roberts; (g) Sobel; (h) Zerocross.



Ground Truth   $q_B$    $H = 1-D$   $C$
(GT)           0.7423   0.8847      0.6568
(GT1)          0.8146   0.8612      0.7015
(GT2)          0.7730   0.8292      0.6410
(GT3)          0.7599   0.7852      0.5966
(GT4)          0.7597   0.7817      0.5939
(GT5)          0.7778   0.8040      0.6253

Table 2. $C$ scores for all GTs available for the image 86000.



To answer that question, we made different experiments using the set of Ground Truths as ED outputs, and also with 100 Canny outputs computed just as in our first experiment.

We scored each GT with our unsupervised measure and report the values in Table 2. The best of all GTs was the second most detailed one, which describes all the building characteristics without being too cluttered; see Figure 11.

In Figure 10 we can see the $C$ measure computed over 100 realizations of Canny's algorithm with the same parameter range as in our first example. We also show plots of our discrepancy measure and Pratt's measure computed using the most detailed ground truth.

Having several Ground Truths to choose from, we can select the best edge map from the set of 100 Canny outputs with Pratt's measure using different Ground Truth maps. In the first row of Figure 11 we can see all the GTs available. In the second row, panel (a) shows the original image and, next to it, panel (b) shows the best map selected by our measure $C$. Panel (c) shows the optimum edge map selected using Pratt's measure with the most detailed GT, called (GT). That map was also selected with the supervised measure $q_{GT}$ using the set of all ground truths available. Panel (d) shows the edge map selected by $q_{GT}$ using all GTs but the first one. Panel (e) shows the edge map selected using Pratt's measure with (GT5). Visual inspection tells us that the maps selected by Pratt's measure and $C$ are almost identical when the GT is the most detailed one. But the differences are very striking when Pratt's measure uses GT5 as ground truth: the selected map has lost the structure of the building.

Figure 10. (a) $q_B$, $H$ and $C$ vs high threshold (HT), (b) Pratt and $q_{GT}$ vs high threshold (HT), computed with the most detailed GT.




Figure 11. First row: panel (GT), the most detailed Ground Truth, and panels (GT1)-(GT5), Ground Truths from the Berkeley database. Second row: (a) image 86000, (b) best map selected with measure $C$, (c) best map selected by Pratt's with GT, (d) best map selected by $q_{GT}$ with all GTs but the first, (e) best map selected by Pratt's with GT5.



In this example we should point out the following conclusions:

• Using supervised measures, the degree of detail of the GT has an impact on the quality of the selected edge map. Pratt's measure selects a better map using a detailed ground truth than using a less detailed one.

• When using our supervised measure, using a set of ground truths compensates for the lack of detail of each GT in the set.

• The measure $q_{GT}$ selects edge maps that are as good as the ones selected by Pratt's measure, and can accommodate the use of a whole set of ground truths to select the best edge maps when moving parameters over a fixed range.

• The measure $C$ selects edge maps that are as good as the ones selected by Pratt's measure when the GT is detailed enough, and selects better maps than Pratt's when Pratt's reference is too sketchy.

5. Conclusions 

In this paper we introduced new ideas of edge equilibrium and edge information that lead to the definition of a new statistical complexity measure for scoring binary maps. To measure edge equilibrium, we defined a similarity index by projecting the ED image onto a family of edge patterns that score the continuity and width of edges in fixed-size windows of the ED image. We measured the information with an entropy function based on the Kolmogorov-Smirnov test statistic. The statistical complexity measure was thus defined as the product of the similarity index that measures equilibrium and the entropy that measures information.

Our experiments on selected images from the South Florida and Berkeley databases showed that the new statistical complexity measure is able to score different man-made Ground Truths meaningfully, and to select the best edge map from a large set of outputs of six different edge detectors (Canny, Sobel, Prewitt, Roberts, Laplacian of Gaussian, and Zerocross), when compared with supervised selections made with Pratt's Figure of Merit (FOM) measure.